lipschitz constant
Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity
Dölz, Jürgen, Multerer, Michael, Palma, Michele
Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent behavior. We propose a data-driven, architecture-agnostic framework based on the discrete modulus of continuity (DMOC), a nonlinear generalization of Lipschitz continuity that provides a finer notion of robustness. Unlike many existing approaches, DMOC does not require access to model internals and instead evaluates regularity relative to the data distribution. This shifts the focus from the model to the data, which provides a data-driven baseline of regularity against which the network's robustness is assessed. We establish convergence results for DMOC-induced seminorms with explicit data-driven rates in terms of the separation distance, and introduce a scalable minibatch algorithm that reduces the quadratic cost of exact computation, enabling application to large-scale data sets such as ImageNet. Empirically, DMOC serves as an architecture-independent diagnostic: it distinguishes trained from untrained networks, reveals underfitting and overfitting regimes, and yields, as a special case, tight Lipschitz estimates comparable to state-of-the-art methods such as ECLipsE and ECLipsE-fast.
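The abstract does not spell out how DMOC is defined, so the following is only a rough sketch of the general idea, assuming the classical modulus of continuity restricted to pairs of data points: for a radius `delta`, take the largest output distance over all sample pairs whose inputs lie within `delta`. The function names (`discrete_modulus`, `empirical_lipschitz`, `minibatch_modulus`) are hypothetical, and the naive random-minibatch loop is a stand-in for, not a reproduction of, the paper's minibatch algorithm. It uses only black-box evaluations `ys[i] = f(xs[i])`, with no access to model internals.

```python
import math
import random
from itertools import combinations


def discrete_modulus(xs, ys, delta):
    """Largest output distance over sample pairs whose inputs are within delta.

    Assumed definition (classical modulus of continuity on a finite sample);
    costs O(n^2) pair evaluations.
    """
    worst = 0.0
    for i, j in combinations(range(len(xs)), 2):
        if math.dist(xs[i], xs[j]) <= delta:
            worst = max(worst, math.dist(ys[i], ys[j]))
    return worst


def empirical_lipschitz(xs, ys):
    """Largest difference quotient over distinct sample pairs.

    This is a lower bound on the true Lipschitz constant of the map.
    """
    best = 0.0
    for i, j in combinations(range(len(xs)), 2):
        dx = math.dist(xs[i], xs[j])
        if dx > 0:
            best = max(best, math.dist(ys[i], ys[j]) / dx)
    return best


def minibatch_modulus(xs, ys, delta, batch_size, num_batches, seed=0):
    """Naive minibatch estimate: max of the exact value over random subsets.

    Each batch only sees a subset of pairs, so the result never exceeds
    discrete_modulus on the full sample.
    """
    rng = random.Random(seed)
    size = min(batch_size, len(xs))
    worst = 0.0
    for _ in range(num_batches):
        batch = rng.sample(range(len(xs)), size)
        bx = [xs[i] for i in batch]
        by = [ys[i] for i in batch]
        worst = max(worst, discrete_modulus(bx, by, delta))
    return worst


# Toy data: the map f(x) = 2x evaluated at three points on the real line.
xs = [(0.0,), (1.0,), (3.0,)]
ys = [(0.0,), (2.0,), (6.0,)]
modulus_at_1 = discrete_modulus(xs, ys, 1.0)  # only the pair (0, 1) qualifies
lipschitz_estimate = empirical_lipschitz(xs, ys)
```

For a Lipschitz map the modulus grows at most linearly in `delta`; a modulus that grows more slowly at larger radii is the kind of finer, nonlinear behavior a single Lipschitz constant cannot express.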
Why Does Agentic Safety Fail to Generalize Across Tasks?
Slutzky, Yonatan, Alexander, Yotam, Slor, Tomer, Nagel, Yoav, Cohen, Nadav
AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and experiments indicating that failures of agentic safety to generalize across tasks are not merely due to limitations of training methods, but reflect an inherent property of safety itself: the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone. Theoretically, we analyze linear-quadratic control with $H_{\infty}$-robustness, and prove that the mapping from task specification to an optimal controller has a higher Lipschitz constant with safety requirements than without, yielding a Lipschitz bound of independent interest. Empirically, we demonstrate our conclusions in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent. Our findings suggest that current efforts to enhance agentic safety may be insufficient, and point to a need for fundamentally different approaches.
Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space: Supplementary Material
Organization: This supplementary material is presented in a format parallel to the main paper. The section numbers and titles are consistent with the main paper, but here we also add one new section, Section 10, where we describe the societal impacts and possible negative impacts of the paper. Similarly, the theorem numbers are consistent with the main paper, but we also have several additional theorems and lemmas which were not included in the main paper. GAN-type Objective for KL Estimation: Let $f$ be a discriminator, $f: X \to \mathbb{R}$. Let $p(x)$ and $q(x)$ be two probability density functions defined over the space $X$.
Appendix Impact
SC stands for the spectral complexity defined in [4]. We use the empirical estimates of the k-variance and the Lipschitz constant defined in Section 5 to calculate kV-Margin and kV-GN-Margin. B.2 Variance of Empirical Estimation: In Table 1, we show the average scores over 4 randomly sampled subsets. We now show the standard deviation in Table 4. Overall, the standard deviation of the estimation is fairly small, consistent with the observation in Theorem 7.
Measuring Generalization with Optimal Transport
Understanding the generalization of deep neural networks is one of the most important tasks in deep learning. Although much progress has been made, theoretical error bounds still often behave disparately from empirical observations. In this work, we develop margin-based generalization bounds, where the margins are normalized with optimal transport costs between independent random subsets sampled from the training distribution. In particular, the optimal transport cost can be interpreted as a generalization of variance which captures the structural properties of the learned feature space. Our bounds robustly predict the generalization error, given training data and network parameters, on large scale datasets. Theoretically, we demonstrate that the concentration and separation of features play crucial roles in generalization, supporting empirical results in the literature.
1102a326d5f7c9e04fc3c89d0ede88c9-Supplemental.pdf
This is the distribution over datasets one obtains by first sampling a task $t$ from $P_t$, and then sampling a dataset $S$ from $P^m_{z|t}$. Here $p(S)$ corresponds to the marginal distribution over datasets $S$. Note that the last line above holds because E P f(,S) does not depend on $t$. Thus, in this section, we present a specialization of the bound for Gaussian distributions. Let $P$ have mean $\mu$ and covariance $\Sigma$; thus $P = \mathcal{N}(\mu, \Sigma)$ and analogously $P_0 = \mathcal{N}(\mu_0, \Sigma_0)$. We can then apply the analytical form for the KL-divergence between two multivariate Gaussian distributions to the bound presented in Theorem 3. The result is the following bound holding under the same assumptions as Theorem 3: L(P,Pt) 1 l We implement the above bound in code instead of the non-specialized form of the KL divergence to speed up computations and simplify gradient computations. A.3.2 Few-Shot Learning Bound with Validation Data: In this section, we will assume that, in addition to the training data $S \sim P^m_{z|t}$, we have access to validation data $S_{va} \sim P^n_{z|t}$ at meta-training time. We will show that a meta-learning generalization bound can still be obtained in this case.
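For reference, the analytical form mentioned above is the standard closed-form KL divergence between two multivariate Gaussians. Writing $\Sigma$ and $\Sigma_0$ for the two covariance matrices and $d$ for the dimension (the symbol $d$ and the direction $\mathrm{KL}(P \,\|\, P_0)$ are assumptions here, since the bound itself is not legible in this excerpt), it reads:

```latex
\mathrm{KL}\bigl(\mathcal{N}(\mu, \Sigma) \,\|\, \mathcal{N}(\mu_0, \Sigma_0)\bigr)
  = \frac{1}{2} \Bigl[
      \operatorname{tr}\bigl(\Sigma_0^{-1} \Sigma\bigr)
      + (\mu_0 - \mu)^{\top} \Sigma_0^{-1} (\mu_0 - \mu)
      - d
      + \ln \frac{\det \Sigma_0}{\det \Sigma}
    \Bigr]
```

When both covariances are diagonal, the trace, quadratic form, and log-determinant ratio all reduce to sums over coordinates, which is one reason such a closed form is cheap to evaluate and differentiate.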